In-memory OLAP aggregation on GPUs using CUDA Dynamic Parallelism

Authors

Jérôme Meinke

Hannah Bast

Steffen Wittmer

Abstract

Most Online Analytical Processing (OLAP) queries depend on aggregating data along the multidimensional hierarchies of an OLAP cube. In real-time OLAP, aggregated data for interactive operations such as roll-up and drill-down is computed on the fly. Fast response times are essential, and aggregation can be accelerated significantly through data-parallel computation on graphics processing units (GPUs). In this thesis, an existing parallel algorithm is modified to use CUDA Dynamic Parallelism (CDP), a technology that allows GPU programs to be launched directly from within other GPU programs in order to extract more parallelism. Furthermore, we present a preaggregation method that uses the CUDA shuffle instruction to optimize both GPU implementations. For evaluation purposes, we additionally implement a sequential aggregation algorithm. Our experiments show that the GPU implementations outperform the single-threaded CPU implementation by a factor of 16 to 218. They further show that the CDP implementation reaches a speedup of 3.72 times over the non-CDP implementation when processing queries for an artificial OLAP cube. In a typical OLAP scenario, however, using CDP slows query processing by a factor of 1.42 on average.
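To make the roll-up operation mentioned in the abstract concrete, the following is a minimal sequential sketch in Python. It is not the thesis's algorithm or its CPU baseline; the hierarchy (`CITY_TO_COUNTRY`), the fact rows (`FACTS`), and the function name `roll_up` are hypothetical and chosen only for illustration. It shows how base-level cells are summed into their parent level along one dimension hierarchy.

```python
from collections import defaultdict

# Hypothetical hierarchy: each base-level city rolls up to a country.
CITY_TO_COUNTRY = {
    "Berlin": "Germany",
    "Freiburg": "Germany",
    "Paris": "France",
}

# Hypothetical fact rows: (city, year, sales).
FACTS = [
    ("Berlin", 2014, 10.0),
    ("Freiburg", 2014, 5.0),
    ("Paris", 2014, 7.0),
    ("Berlin", 2015, 12.0),
    ("Paris", 2015, 3.0),
]


def roll_up(facts, parent_of):
    """Sum the measure of every fact row into its parent-level cell."""
    totals = defaultdict(float)
    for member, year, value in facts:
        # Map the base-level member to its parent and accumulate the measure.
        totals[(parent_of[member], year)] += value
    return dict(totals)


# Aggregate city-level sales to the country level, one cell per (country, year).
by_country = roll_up(FACTS, CITY_TO_COUNTRY)
```

Each iteration of the loop is independent apart from the final accumulation, which is the kind of work a data-parallel implementation would distribute across GPU threads.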

Similar resources

Dynamic Task Parallelism with a GPU Work-Stealing Runtime System

NVIDIA’s Compute Unified Device Architecture (CUDA) and its attached C/C++ based API went a long way towards making GPUs more accessible to mainstream programming. So far, the use of GPUs for high performance computing has been primarily restricted to data parallel applications, and with good reason. The high number of computational cores and high memory bandwidth supported by the device makes ...

Accelerating high-order WENO schemes using two heterogeneous GPUs

A double-GPU code is developed to accelerate WENO schemes. The test problem is a compressible viscous flow. The convective terms are discretized using third- to ninth-order WENO schemes and the viscous terms are discretized by the standard fourth-order central scheme. The code written in CUDA programming language is developed by modifying a single-GPU code. The OpenMP library is used for parall...

An approach to Improve Particle Swarm Optimization Algorithm Using CUDA

The time consumption in solving computationally heavy problems has always been a concern for computer programmers. Due to simplicity of its implementation, the PSO (Particle Swarm Optimization) is a suitable meta-heuristic algorithm for solving computationally heavy problems. However, despite the simplicity, the algorithm is inefficient for solving real computationally heavy problems but the pr...

Efficient Parallelization of Natural Language Applications using GPUs

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission....

Graph Generation on GPUs using Dynamic Memory Allocation

Complex networks are often studied using statistical measurements over many independently generated samples. Irregular data structures such as graphs that involve dynamical memory management and “pointer chasing” are an important class of application and have attracted recent interest in the form of the Graph500 benchmark formulation. The generation of simulated sample network graphs and measur...

Publication date: 2015
